IN THE CLAIMS 



Claims 1-28. (Cancelled). 

29. (New) A machine-implemented method for multiplying a matrix [A] by a matrix of 
inputs [X] to obtain a matrix of outputs [Y], the method comprising: 

forming [A] as a matrix of predetermined values and multiplication operations, 
wherein the multiplication operations are selectively positioned into pairs within [A] to 
reduce the number of the multiplication operations upon factorization of [A]; 

factoring [A] into a butterfly matrix [B], a shuffle matrix [S], and a multiplication 
matrix [M]; 

grouping a set of values together within [M] for simultaneous execution by a 
processor instruction; and 

simultaneously executing multiplication operations on the grouped set of values 
using a Single Instruction Multiple Data (SIMD) instruction. 

30. (New) The machine-implemented method of claim 29, wherein the SIMD 
instruction is a Packed Multiply and Add (PMADDWD) instruction. 
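
As an illustrative note on the instruction named in claim 30, the following is a minimal sketch of the arithmetic PMADDWD performs: signed 16-bit words are multiplied element by element, and each adjacent pair of 32-bit products is summed into one signed 32-bit result. The function name `pmaddwd` and the use of Python lists in place of packed registers are assumptions made for illustration; this emulates the arithmetic only and is not the claimed implementation.

```python
def pmaddwd(a, b):
    """Emulate the arithmetic of PMADDWD on two equal-length lists of
    signed 16-bit integers (hypothetical helper, for illustration only)."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("operands must have the same even length")
    for v in list(a) + list(b):
        if not -32768 <= v <= 32767:
            raise ValueError("operands must fit in signed 16 bits")
    out = []
    for i in range(0, len(a), 2):
        # Multiply two adjacent word pairs and add the two products.
        s = a[i] * b[i] + a[i + 1] * b[i + 1]
        # Wrap to signed 32 bits; this only matters when all four
        # words are -32768, where the sum is 2**31.
        s = (s + 2**31) % 2**32 - 2**31
        out.append(s)
    return out


result = pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])
```

Here `result` holds two sums, one per adjacent pair, so four multiplications and two additions are expressed as a single packed operation.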

31. (New) The machine-implemented method of claim 30, wherein values within [B] and 
[S] are integers selected from the group consisting of 1, 0 and -1. 

32. (New) The machine-implemented method of claim 31, wherein [A] is a 4-point 
Discrete Cosine Transform (DCT) transformation matrix, [X] represents a time domain 
of a video signal, and [Y] represents a frequency domain of the video signal. 
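
As an illustrative note on claims 29 through 32, the sketch below shows one well-known way a 4-point DCT can be split into an add/subtract (butterfly) stage whose coefficients are only 1 and -1, followed by a multiplication stage in which every output is a sum of two products. It uses the standard even/odd decomposition of the unnormalized DCT-II, which is an assumption: it is not taken from the claims and is not necessarily the [B], [S] and [M] recited above. The function names are hypothetical.

```python
import math

# Cosine constants of the unnormalized 4-point DCT-II.
C1 = math.cos(math.pi / 8)
C2 = math.cos(math.pi / 4)
C3 = math.cos(3 * math.pi / 8)


def dct4_direct(x):
    """Reference: Y[k] = sum over n of x[n] * cos((2n + 1) * k * pi / 8)."""
    return [
        sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 8) for n in range(4))
        for k in range(4)
    ]


def dct4_factored(x):
    """Butterfly stage followed by paired multiply-adds (assumed factorization)."""
    x0, x1, x2, x3 = x
    # Butterfly stage: additions and subtractions only.
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    # Multiplication stage: each output is one pair of products summed,
    # which is the shape a packed multiply-and-add instruction computes.
    pairs = [
        ((s0, s1), (1.0, 1.0)),   # Y[0]
        ((d0, d1), (C1, C3)),     # Y[1]
        ((s0, s1), (C2, -C2)),    # Y[2]
        ((d0, d1), (C3, -C1)),    # Y[3]
    ]
    return [v[0] * c[0] + v[1] * c[1] for v, c in pairs]


sample = [1.0, 2.0, 3.0, 4.0]
factored = dct4_factored(sample)
direct = dct4_direct(sample)
```

In this arrangement the two results agree to floating-point precision, and the coefficient pairs are the kind of values that could be grouped for a single packed instruction.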

33. (New) The machine-implemented method of claim 32, wherein the multiplication 
matrix [M] is 

[matrix [M] illegible in source; surviving entries include 0, 1, and positive and negative cosine terms] 

and wherein the grouped set of values are 

34. (New) A machine-readable medium having instructions to cause a machine to 
perform a machine-implemented method for multiplying a matrix [A] by a matrix of 
inputs [X] to obtain a matrix of outputs [Y], the method comprising: 

forming [A] as a matrix of predetermined values and multiplication operations, 
wherein the multiplication operations are selectively positioned into pairs within [A] to 
reduce the number of the multiplication operations upon factorization of [A]; 

factoring [A] into a butterfly matrix [B], a shuffle matrix [S], and a multiplication 
matrix [M]; 

grouping a set of values together within [M] for simultaneous execution by a 
processor instruction; and 

simultaneously executing multiplication operations on the grouped set of values 
using a Single Instruction Multiple Data (SIMD) instruction. 

35. (New) The machine-readable medium of claim 34, wherein the SIMD instruction 
is a Packed Multiply and Add (PMADDWD) instruction. 

36. (New) The machine-readable medium of claim 35, wherein values within [B] and 
[S] are integers selected from the group consisting of 1, 0 and -1. 



37. (New) The machine-readable medium of claim 36, wherein [A] is a 4-point 
Discrete Cosine Transform (DCT) transformation matrix, [X] represents a time domain 
of a video signal, and [Y] represents a frequency domain of the video signal. 



38. (New) The machine-readable medium of claim 37, wherein the multiplication 
matrix [M] is 

and wherein the grouped set of values are 

39. (New) A system comprising: 

a processing unit coupled to a memory through a bus; and 

a process for multiplying a matrix [A] by a matrix of inputs [X] to obtain a matrix of 
outputs [Y], the process executed from the memory by the processing unit to cause the 
processing unit to: 

form [A] as a matrix of predetermined values and multiplication operations, 
wherein the multiplication operations are selectively positioned into pairs within [A] to 
reduce the number of the multiplication operations upon factorization of [A]; 

factor [A] into a butterfly matrix [B], a shuffle matrix [S], and a multiplication 
matrix [M]; 

group a set of values together within [M] for simultaneous execution by a 
processor instruction; and 

simultaneously execute multiplication operations on the grouped set of values 
using a Single Instruction Multiple Data (SIMD) instruction. 

40. (New) The system of claim 39, wherein the SIMD instruction is a Packed Multiply 
and Add (PMADDWD) instruction. 

41. (New) The system of claim 40, wherein values within [B] and [S] are integers 
selected from the group consisting of 1, 0 and -1. 

42. (New) The system of claim 41, wherein [A] is a 4-point Discrete Cosine 
Transform (DCT) transformation matrix, [X] represents a time domain of a video signal, 
and [Y] represents a frequency domain of the video signal. 

43. (New) The system of claim 42, wherein the multiplication matrix [M] is 

and wherein the grouped set of values are 